1 visR by Example

1.1 Alluvial plot

Alluvial (Sankey) plots are often used to assess patterns in patient flow, e.g., consecutively received treatments. The function ‘vr_alluvial_plot’ is a wrapper around the easyalluvial and parcats packages that facilitates this task. A short tutorial follows, starting with the installation of the development version of visR.

devtools::install_github("https://github.com/openpharma/visR.git", ref = "develop")


To use this function we need data that contain three variables: patient IDs, treatment names and line-of-therapy numbers. Let’s simulate such a dataset.

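# Start empty and append one row per patient and line of therapy
dataset <- NULL

for(PatientID in 1:5000){
  
  # Highest line of therapy reached by this patient
  max_line <- sample(0:5, 1, prob = c(0.2, 0.4, 0.2, 0.15, 0.04, 0.01))
  
  # Patients who stop at line 1 start either at line 0 or directly at line 1;
  # all other patients start at line 0
  if(max_line == 1){ 
    
    min_line <- sample(c(0, 1), 1, prob = c(0.3, 0.7)) 
    
  } else{ min_line <- 0 }
  
  for(LineNumber in min_line:max_line){
    
    # Draw the treatment received in this line of therapy
    LineName <- sample(
      c('Treatment_A', 'Treatment_B', 'Treatment_C', 'Treatment_D' ), 
      1, 
      prob = c(0.5, 0.3, 0.1, 0.1)
    )
    
    patient_data_line <- data.frame(PatientID = PatientID,
                                    LineName = LineName,
                                    LineNumber = LineNumber)
    
    # Prepend the new row to the rows collected so far
    dataset <- rbind(patient_data_line , dataset)
    
  }
  
}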
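Growing a data frame with rbind inside a loop becomes slow as the number of patients increases. As an alternative sketch (not the code used to produce the figures and tables in this chapter), the rows of each patient can be generated by a small helper and bound together once. The helper name ‘simulate_patient’, the object name ‘dataset_fast’, the seed and the reduced number of patients are assumptions made for illustration.

```r
# Hypothetical helper: simulate all therapy lines of one patient,
# using the same probabilities as the loop above.
simulate_patient <- function(patient_id) {
  max_line <- sample(0:5, 1, prob = c(0.2, 0.4, 0.2, 0.15, 0.04, 0.01))
  min_line <- if (max_line == 1) sample(c(0, 1), 1, prob = c(0.3, 0.7)) else 0
  line_numbers <- min_line:max_line
  data.frame(
    PatientID = patient_id,
    # One independent draw per line of therapy
    LineName = sample(
      c("Treatment_A", "Treatment_B", "Treatment_C", "Treatment_D"),
      length(line_numbers),
      replace = TRUE,
      prob = c(0.5, 0.3, 0.1, 0.1)
    ),
    LineNumber = line_numbers,
    stringsAsFactors = FALSE
  )
}

set.seed(1)  # assumption: fixed seed, only to make this sketch reproducible
# Build all per-patient data frames first, then bind them in a single step
dataset_fast <- do.call(rbind, lapply(1:200, simulate_patient))
```

The result has the same three columns as ‘dataset’, so it could be passed to the plotting function in the same way.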


For any dataset, the minimum we need to pass to the function are the names of the variables holding the patient IDs, the treatment names and the line-of-therapy numbers:

vr_alluvial_plot(dataset,
                 id = "PatientID",
                 linename = "LineName",
                 linenumber = "LineNumber",
                 data_source = "Simulation")

Figure 1.1: Alluvial plot for simulated dataset


If you use the package with the Flatiron database, there is good news: you do not need to pass the variable names, because the names used in the Flatiron database (“PatientID”, “LineName”, “LineNumber”) are the defaults. In this case you can simply run:

vr_alluvial_plot(dataset)
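If a dataset uses other column names, one option is to rename its columns to these defaults before plotting, instead of passing the names on every call. A minimal base R sketch follows; the data frame ‘my_data’ and its original column names are hypothetical.

```r
# Hypothetical input whose column names differ from the defaults
my_data <- data.frame(
  subject = c(1, 1, 2),
  regimen = c("Treatment_A", "Treatment_B", "Treatment_A"),
  line = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# Rename the three columns to the default names quoted above
old_names <- c("subject", "regimen", "line")
new_names <- c("PatientID", "LineName", "LineNumber")
names(my_data)[match(old_names, names(my_data))] <- new_names
```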


Besides the parameters discussed so far, there are other important ones. Here is the full list:

#' @param n_common Number of most common lines of therapy presented in alluvial plot, Default: 2
#' @param title plot title, Default: NULL
#' @param interactive interactive plot, Default: FALSE
#' @param N_unit Default: 'patients'
#' @param data_source data source name, Default: 'Flatiron'
#' @param fill_by one_of(c('first_variable', 'last_variable', 'all_flows', 'values')), Default: 'first_variable'
#' @param col_vector_flow HEX color values for flows, Default: easyalluvial::palette_filter( greys = F)
#' @param col_vector_value HEX color values for y levels/values, Default: RColorBrewer::brewer.pal(9, 'Blues')[c(3,6,4,7,5,8)]
#' @param linenames_labels_size Default: 2.5


The ‘interactive’ parameter produces an interactive version of the plot. Keep in mind that the result is an HTML widget, so parameters such as ‘title’ or ‘data_source’ are not taken into account.

vr_alluvial_plot(
  dataset,
  id = "PatientID",
  linename = "LineName",
  linenumber = "LineNumber",
  interactive = TRUE
  )

Figure 1.2: Alluvial plot for simulated dataset


1.2 Alluvial data wrangling

When you use ‘vr_alluvial_plot’, the ‘vr_alluvial_wrangling’ function runs in the background. It is responsible for finding the most common treatments in each line of therapy and for censoring due to death (no later treatment).

You can obtain the output data of ‘vr_alluvial_wrangling’ as follows:

result <- vr_alluvial_wrangling(dataset,
                                id = "PatientID",
                                linename = "LineName",
                                linenumber = "LineNumber",
                                n_common = 2)

output <- result$alluvial_plot_data

library(magrittr)  # provides the %>% pipe used below

knitr::kable(output[1:10,]) %>%
  kableExtra::kable_styling(full_width = TRUE,
                            position = "center")
id  linename        linenumber
1   Other Tx        0
1   Treatment_A 1L  1
1   Treatment_B 2L  2
1   No Tx 3L        3
1   No Tx 4L        4
2   Treatment_B 0L  0
2   No Tx 1L        1
2   No Tx 2L        2
2   No Tx 3L        3
2   No Tx 4L        4
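The output above suggests that every patient gets one row per line of therapy and that lines without a recorded treatment are labelled ‘No Tx’. The base R sketch below illustrates that idea on a toy dataset. It is an assumption about the behaviour inferred from the table, not the code used inside ‘vr_alluvial_wrangling’; the object names ‘toy’, ‘grid’ and ‘padded’ are hypothetical, and the line suffix (e.g. ‘1L’) is left out for brevity.

```r
# Toy data: patient 1 has lines 0 and 1, patient 2 only line 0
toy <- data.frame(
  id = c(1, 1, 2),
  linename = c("Treatment_A", "Treatment_B", "Treatment_B"),
  linenumber = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# One row for every patient and every line of therapy seen in the data
grid <- expand.grid(
  id = unique(toy$id),
  linenumber = sort(unique(toy$linenumber))
)

# Left join: lines without a recorded treatment get NA as linename ...
padded <- merge(grid, toy, by = c("id", "linenumber"), all.x = TRUE)

# ... which is then replaced by the "No Tx" label
padded$linename[is.na(padded$linename)] <- "No Tx"
padded <- padded[order(padded$id, padded$linenumber), ]
```

In this toy example only patient 2 at line 1 ends up with the ‘No Tx’ label.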


There is also a summary table:

output <- result$linenames_summary

knitr::kable(output[1:10,]) %>%
  kableExtra::kable_styling(full_width = TRUE,
                            position = "center")
linenumber  linename        patient_count  freq
0           Treatment_A 0L  1743           0.3486
0           No Tx 0L        1423           0.2846
0           Treatment_B 0L  1143           0.2286
0           Other Tx        691            0.1382
1           Treatment_A 1L  1999           0.3998
1           Treatment_B 1L  1163           0.2326
1           No Tx 1L        969            0.1938
1           Other Tx        869            0.1738
2           No Tx 2L        2993           0.5986
2           Treatment_A 2L  1042           0.2084

Additionally, if you need a summary table of the patient data before it is truncated to the n most common treatments, you can get it with:

output <- result$linenames_summary_long

knitr::kable(output[1:10,]) %>%
  kableExtra::kable_styling(full_width = TRUE,
                            position = "center")
linenumber  linename        patient_count  freq
0           Treatment_A 0L  1743           0.3486
0           No Tx 0L        1423           0.2846
0           Treatment_B 0L  1143           0.2286
0           Treatment_C 0L  350            0.0700
0           Treatment_D 0L  341            0.0682
1           Treatment_A 1L  1999           0.3998
1           Treatment_B 1L  1163           0.2326
1           No Tx 1L        969            0.1938
1           Treatment_C 1L  457            0.0914
1           Treatment_D 1L  412            0.0824
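
Comparing the two summary tables for line 0 shows how the truncation works: with n_common = 2, Treatment_A and Treatment_B are kept, Treatment_C and Treatment_D are pooled into ‘Other Tx’ (350 + 341 = 691), and ‘No Tx’ stays a separate category. The base R sketch below reproduces that arithmetic from the line 0 counts printed above. Keeping ‘No Tx’ out of the ranking and computing freq as patient_count divided by the total number of patients are inferred from the tables, not taken from the package source; the object names are hypothetical.

```r
# Line 0 counts copied from the long summary table above
long_summary <- data.frame(
  linename = c("Treatment_A 0L", "No Tx 0L", "Treatment_B 0L",
               "Treatment_C 0L", "Treatment_D 0L"),
  patient_count = c(1743, 1423, 1143, 350, 341),
  stringsAsFactors = FALSE
)
n_common <- 2

# Rank the real treatments by count; "No Tx" is kept out of the ranking
is_no_tx <- grepl("^No Tx", long_summary$linename)
treatments <- long_summary[!is_no_tx, ]
treatments <- treatments[order(-treatments$patient_count), ]
keep <- head(treatments$linename, n_common)

# Pool everything that is neither "No Tx" nor among the n most common
pooled <- long_summary
pooled$linename[!is_no_tx & !(pooled$linename %in% keep)] <- "Other Tx"
short_summary <- aggregate(patient_count ~ linename, data = pooled, FUN = sum)

# Share of patients in each category
short_summary$freq <- short_summary$patient_count / sum(short_summary$patient_count)
short_summary <- short_summary[order(-short_summary$patient_count), ]
```

The pooled row has a patient_count of 691 and a freq of 0.1382, which matches the ‘Other Tx’ row for line 0 in the truncated summary table.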